191 research outputs found

    Analyzing State Sequences with Probabilistic Suffix Trees: The PST R Package

    This article presents the PST R package for categorical sequence analysis with probabilistic suffix trees (PSTs), i.e., structures that store variable-length Markov chains (VLMCs). VLMCs make it possible to model high-order dependencies in categorical sequences with parsimonious models based on simple estimation procedures. The package is specifically adapted to the social sciences, as it allows VLMC models to be learned from sets of individual sequences possibly containing missing values; in addition, the package is extended to account for case weights. This article describes how a VLMC model is learned from one or more categorical sequences and stored in a PST. The PST can then be used for sequence prediction, i.e., to assign a probability to whole observed or artificial sequences. This feature supports data mining applications such as the extraction of typical patterns and outliers. This article also introduces original visualization tools for both the model and the outcomes of sequence prediction. Other features such as functions for pattern mining and artificial sequence generation are described as well. The PST package also allows for the computation of probabilistic divergence between two models and the fitting of segmented VLMCs, where sub-models fitted to distinct strata of the learning sample are stored in a single PST.
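
    As an illustration of the sequence prediction described in this abstract, the following Python sketch assigns a probability to a whole sequence by looking up, at each position, the longest stored context that is a suffix of the history. It is not the PST package's API: the function names, the dictionary representation of the tree, and the probabilities are all hypothetical.

```python
def longest_context(history, tree):
    """Return the longest suffix of `history` stored in `tree`.

    `tree` maps context tuples to next-symbol distributions; the empty
    tuple () plays the role of the root and must always be present.
    """
    for start in range(len(history) + 1):
        ctx = tuple(history[start:])
        if ctx in tree:
            return ctx
    raise KeyError("tree must contain the empty context ()")


def sequence_probability(seq, tree):
    """Multiply the conditional probabilities of each symbol given its context."""
    prob = 1.0
    for i, symbol in enumerate(seq):
        ctx = longest_context(seq[:i], tree)
        prob *= tree[ctx].get(symbol, 0.0)
    return prob


# Hypothetical tree over the alphabet {a, b}; all values are made up.
tree = {
    (): {"a": 0.5, "b": 0.5},
    ("a",): {"a": 0.9, "b": 0.1},
    ("b", "a"): {"a": 0.2, "b": 0.8},
}

p_aab = sequence_probability(["a", "a", "b"], tree)
p_bab = sequence_probability(["b", "a", "b"], tree)
```

    In this toy tree, the last symbol of `b, a, b` is predicted from the order-2 context `(b, a)`, while the last symbol of `a, a, b` falls back to the order-1 context `(a)`, which is the variable-length behaviour the abstract refers to.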

    Analyzing and Visualizing State Sequences in R with TraMineR

    This article describes the many capabilities offered by the TraMineR toolbox for categorical sequence data. It focuses more specifically on the analysis and rendering of state sequences. Addressed features include the description of sets of sequences by means of transversal aggregated views, the computation of longitudinal characteristics of individual sequences and the measure of pairwise dissimilarities. Special emphasis is put on the multiple ways of visualizing sequences. The core element of the package is the state sequence object in which we store the set of sequences together with attributes such as the alphabet, state labels and the color palette. The functions can then easily retrieve this information to ensure presentation homogeneity across all printed and graphical displays. The article also demonstrates how TraMineR's outcomes give access to advanced analyses such as clustering and statistical modeling of sequence data.

    Using dynamic microsimulation to understand professional trajectories of the active Swiss population

    Life course events are of interest across the social and economic sciences, and to demographers in particular. By looking at life sequences, we can better understand which states, or life events, precede or act as precursors to vulnerability. A tool that has been used for policy evaluation and has recently been gaining ground in life course sequence simulation is dynamic microsimulation. In this context, dynamic microsimulation consists in generating entire life courses from the observation of portions of the trajectories of individuals of different ages. In this work, we aim to use dynamic microsimulation to analyse individual professional trajectories with a focus on vulnerability. The primary goal of this analysis is to extend the current literature by providing insight from a longitudinal perspective into the signs of work instability and the process of precarity. The secondary goal of this work is to show how, by using microsimulation, data collected for one purpose can be analysed under a different scope and used in a meaningful way. The data to be used in this analysis are longitudinal and were collected by NCCR-LIVES IP207 under the supervision of Prof. Christian Maggiori and Dr. Gregoire Bollmann. Individuals aged 25 to 55 residing in the German-speaking and French-speaking regions of Switzerland were followed annually for four years. These individuals were questioned about, among other things, their personal, professional and overall situations and well-being. At the end of the fourth wave, there were 1131 individuals who had participated in all waves. The sample remained representative of the Swiss population, with women and the unemployed slightly overrepresented. Using the information collected from these surveys, we use simulation to construct various longitudinal data modules, where each data module represents a specific life domain. We postulate the relationship between these modules and lay out a framework of estimation.
Within certain data modules, a set of equations is created to model the process therein. For every dynamic (time-variant) data module, such as the labour-market module, the transition probabilities between states (e.g., labour market status) are estimated using a Markov model and then the possible outcomes are simulated. The benefit of using dynamic microsimulation is that longitudinal sample observations, instead of stylised profiles, are used to model population dynamics. This is one of the main reasons large-scale dynamic microsimulation models are employed by many developed nations. There has been limited use, however, of such approaches with Swiss data. This work contributes to the analysis of professional trajectories of the active Swiss population by utilising dynamic microsimulation methods.
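
The estimate-then-simulate step described for the dynamic modules can be sketched in a few lines of Python. This is a minimal first-order illustration, not the authors' model: the function names, the two-state alphabet (E = employed, U = unemployed) and the toy sequences are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def estimate_transitions(sequences):
    """Maximum-likelihood first-order transition probabilities from state sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    return {
        state: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for state, c in counts.items()
    }


def simulate(transitions, start, steps, rng):
    """Simulate a trajectory; a state with no observed exits is treated as absorbing."""
    path = [start]
    for _ in range(steps):
        probs = transitions.get(path[-1])
        if not probs:
            path.append(path[-1])
            continue
        states = list(probs)
        weights = [probs[s] for s in states]
        path.append(rng.choices(states, weights=weights)[0])
    return path


# Hypothetical observed labour-market sequences, one per individual.
observed = [list("EEEU"), list("EUUE"), list("EEEE")]
P = estimate_transitions(observed)

# Simulate four further periods for an individual who starts employed.
trajectory = simulate(P, "E", 4, random.Random(0))
```

Passing an explicit `random.Random` instance keeps the simulation reproducible, which matters when many simulated life courses are compared across scenarios.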

    Siblings in a (neo-)Malthusian town: from cross-sectional to longitudinal perspectives

    This paper explores a database constructed from six population censuses conducted in the city of Geneva between 1816 and 1843. The authors look at cohabitation structures from a sibling perspective. First, the authors show to what extent cross-sectional data can inform about life course patterns. Second, the authors examine the transitions from one sibling status to another over the next six years, and the effect of several demographic, familial, and social variables on the transition probabilities. Results show how the lives of siblings were framed by the interactions between a (neo-)Malthusian demographic regime and a nuclear family system. Population heterogeneity resulted from the social importance of statistically marginal behaviors, as well as from the coexistence of two systems of leaving home: the socially differentiated one of the siblings who grew up in urban families, and another one of children from rural families who went through Geneva during their period of life cycle service. (authors' abstract)

    A discussion on hidden Markov models for life course data

    This is an introduction to discrete-time hidden Markov models (HMMs) for longitudinal data analysis in population and life course studies. In the Markovian perspective, life trajectories are considered as the result of a stochastic process in which the probability of occurrence of a particular state or event depends on the sequence of states observed so far. Markovian models are used to analyze the transition process between successive states. Starting from the traditional formulation of a first-order discrete-time Markov chain, where each state is linked to the next one, we present hidden Markov models, where the current response is driven by a latent variable that follows a Markov process. The paper also presents a simple way of handling categorical covariates to capture the effect of external factors on the transition probabilities, and existing software is briefly reviewed. Empirical illustrations using data on self-reported health demonstrate the relevance of the different extensions for life course analysis.
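
    To make the latent-variable idea concrete, the following Python sketch computes the likelihood of an observed sequence under a discrete HMM with the standard forward algorithm. It is only an illustration: the latent states, the observed categories, and every parameter value are hypothetical and are not taken from the paper.

```python
def forward_likelihood(obs, init, trans, emit):
    """Probability of an observation sequence under a discrete HMM (forward algorithm)."""
    states = list(init)
    # alpha[s] = P(observations so far, latent state at current time = s)
    alpha = {s: init[s] * emit[s][obs[0]] for s in states}
    for o in obs[1:]:
        alpha = {
            s: emit[s][o] * sum(alpha[r] * trans[r][s] for r in states)
            for s in states
        }
    return sum(alpha.values())


# Hypothetical parameters: a latent health status drives self-reported health.
init = {"well": 0.6, "frail": 0.4}
trans = {
    "well": {"well": 0.7, "frail": 0.3},
    "frail": {"well": 0.4, "frail": 0.6},
}
emit = {
    "well": {"good": 0.9, "poor": 0.1},
    "frail": {"good": 0.2, "poor": 0.8},
}

likelihood = forward_likelihood(["good", "good", "poor"], init, trans, emit)
```

    The recursion sums over the latent state at each step, so its cost grows linearly with the sequence length rather than exponentially.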

    Coefficient-Wise Tree-Based Varying Coefficient Regression with vcrpart

    The tree-based TVCM algorithm and its implementation in the R package vcrpart are introduced for generalized linear models. The purpose of TVCM is to learn whether and how the coefficients of a regression model vary by moderating variables. A separate partition is built for each potentially varying coefficient, allowing the user to specify coefficient-specific sets of potential moderators, and allowing the algorithm to select moderators individually by coefficient. In addition to describing the algorithm, the article evaluates TVCM using a benchmark comparison and a simulation study, and the R commands are demonstrated by means of empirical applications.
